Mining Non-Redundant Sets of Generalizing Patterns from Sequence Databases

Authors

  • Niek Tax
  • Marlon Dumas

Abstract

Sequential pattern mining techniques extract patterns corresponding to frequent subsequences from a sequence database. A practical limitation of these techniques is that they overload the user with too many patterns. Local Process Model (LPM) mining is an alternative approach coming from the field of process mining. In traditional sequential pattern mining, a pattern describes one subsequence, whereas an LPM captures a set of subsequences. Also, while traditional sequential patterns only match subsequences that are observed in the sequence database, an LPM may capture subsequences that are not explicitly observed but are related to observed subsequences. In other words, LPMs generalize the behavior observed in the sequence database. These properties make it possible for a set of LPMs to cover the behavior of a much larger set of sequential patterns. Yet existing LPM mining techniques still suffer from the pattern explosion problem because they produce sets of redundant LPMs. In this paper, we propose several heuristics to mine a set of non-redundant LPMs either from a set of redundant LPMs or from a set of sequential patterns. We empirically compare the proposed heuristics with one another and against existing (local) process mining techniques in terms of coverage, precision, and complexity of the produced sets of LPMs.
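The idea of a small set of LPMs covering a larger set of sequential patterns can be illustrated with a toy sketch. It rests on loud simplifying assumptions: each LPM is represented only by the finite set of subsequences it accepts (its language), not as an actual process model, and redundancy is removed with a generic greedy set-cover selection. That greedy step is a stand-in chosen for illustration and is not claimed to be one of the heuristics proposed in the paper. All names (`is_subsequence`, `support`, `select_non_redundant`) and the example data are hypothetical.

```python
from typing import FrozenSet, List, Sequence, Tuple

Pattern = Tuple[str, ...]
# Simplification (assumption): an LPM is modeled as the finite set of
# subsequences it accepts, rather than as a real process model.
LPM = FrozenSet[Pattern]


def is_subsequence(pattern: Sequence[str], trace: Sequence[str]) -> bool:
    """True if the symbols of `pattern` occur in `trace` in order, gaps allowed."""
    it = iter(trace)
    # `symbol in it` advances the iterator, so order is enforced.
    return all(symbol in it for symbol in pattern)


def support(pattern: Pattern, database: List[List[str]]) -> int:
    """Number of sequences in the database that contain `pattern`."""
    return sum(is_subsequence(pattern, trace) for trace in database)


def select_non_redundant(lpms: List[LPM], patterns: List[Pattern]) -> List[int]:
    """Generic greedy set cover (illustrative stand-in, not the paper's method).

    Repeatedly picks the LPM covering the most still-uncovered patterns and
    stops once no remaining LPM adds coverage; skipped LPMs are redundant.
    """
    uncovered = set(patterns)
    chosen: List[int] = []
    while uncovered:
        best = max(range(len(lpms)), key=lambda i: len(lpms[i] & uncovered))
        gain = lpms[best] & uncovered
        if not gain:
            break  # nothing left that any LPM can cover
        chosen.append(best)
        uncovered -= gain
    return chosen


# Hypothetical toy sequence database.
database = [["a", "b", "c", "d"], ["a", "c", "b"], ["e", "x", "d"], ["a", "b", "x", "c"]]
candidates: List[Pattern] = [("a", "b", "c"), ("a", "c", "b"), ("e", "d"), ("b", "a")]
patterns = [p for p in candidates if support(p, database) >= 1]

lpms: List[LPM] = [
    frozenset({("a", "b", "c"), ("a", "c", "b")}),  # a, then b and c in either order
    frozenset({("a", "b", "c")}),                   # subsumed by the first LPM
    frozenset({("e", "d"), ("d", "e")}),            # also accepts unobserved ("d", "e")
]

chosen = select_non_redundant(lpms, patterns)
print(chosen)  # [0, 2]
```

Here the first LPM accounts for two sequential patterns at once, and the second LPM is never selected because it covers nothing the first does not. The third LPM also accepts `("d", "e")`, which never occurs in the toy database, loosely mirroring how an LPM can generalize beyond observed subsequences.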

Journal:
  • CoRR

Volume: abs/1712.04159

Publication date: 2017